Brain computer interface (BCI) has many useful applications that help disabled people who have an
active brain but difficulty moving and speaking. One of these applications is the wheelchair, a
device that is always operated by a single user and is not shared or borrowed. Single-user
applications need feature extraction methods that achieve high classification accuracy with small
training datasets. The variability of the subject's mood across recording sessions and tiredness
during long sessions are serious problems that reduce the classification accuracy in these
applications. Transfer learning can address this problem by recording short, separate sessions for
the same subject at different times or on different days. The method proposed in this paper uses
motor imagery (MI) signals from different sessions recorded by one user to build a training dataset
of acceptable size. To regularize the different recording sessions, four mutually independent
tuning parameters are generated using a loop; these parameters set the ratios of the covariance
matrices. The suggested method performs very well with different numbers of training samples when
compared with six other common spatial patterns (CSP) methods using only two channels.
1. INTRODUCTION

Brain activity can be recorded using two types of methods: non-invasive or invasive. Non-invasive
methods use channels attached to the scalp of the user. Features can be extracted from the
different channels of the recorded signal using methods such as the signal power, either directly
or combined with other methods [1]. Spatial filters are one of these feature extraction methods.
Ramoser et al. [2] used common spatial patterns (CSP) to extract motor imagery (MI) features.
Conventional CSP is a user- or session-dependent method: no information from other users or
sessions is added to the training dataset. Wang et al. [3] and Cao et al. [4] used classic CSP in
hybrid wheelchair control systems, each with a different number of commands.

Cerny and Stastny [5] applied CSP to finger movement instead of hand movement. Sun et al. [6]
used CSP and a support vector machine (SVM) in an online experiment. Classic CSP gives poor
classification accuracy when the training dataset is small. Pan et al. [7] applied the same
principles as CSP to frequency-domain signals extracted from only two channels, a method named
common frequency pattern (CFP). Further modifications and improvements have been made to increase
the classification accuracy. Xygonakis et al. [8] extracted the spatial filter from more than one
region, named regions of interest. Park and Chung [9]


Journal homepage: http://beei.org 


Bulletin of Electr Eng & Inf ISSN: 2302-9285 O 3585 


divided the head area into sub-areas, each centered on a channel, and found the best region from
which to compute the CSP spatial filter.

Lv and Liu [10] used binary particle swarm optimization (BPSO) to select the best channels for
computing the CSP spatial filter. Wang et al. [11] modified the CSP feature extraction method and
applied it in the frequency domain instead of the time domain to reduce the time consumed in
extracting features. Zhang et al. [12] used wavelet analysis to enhance the signal, while
Gouy-Pailler et al. [13] used independent component analysis (ICA) before applying CSP. Mousavi
et al. [14] extracted the frequency band of interest using wavelet transform analysis and then used
the resulting signals to extract the CSP spatial filter, while Robinson et al. [15] removed the
unneeded components and used the signal reconstructed from the useful ones as the input to the CSP
filter. Ang et al. [16] divided the frequency band into multiple sub-bands, each used to find a
spatial filter and features that were then applied to different classifiers; their method is named
filter bank CSP (FBCSP). Novi et al. [17] applied the sub-bands to the same classifier, a method
named sub-band CSP (SBCSP).

Kumar and Sharma [18] used meta-heuristic algorithms to find the frequency range and band-pass
filter order that give the best classification accuracy. Ge et al. [19] fused time-domain features
with CSP features. Song and Yoon [20] introduced adaptive CSP, which measures the similarity
between the training data and the new data to improve the spatial filter. Higashi and Tanaka [21]
optimized the parameters and window size used in calculating the CSP filter. To increase
separability, Li et al. [22] used a complexity index in calculating the CSP spatial filter. To
increase the robustness of the features, Samek et al. [23] combined CSP with Tikhonov
regularization to produce a stationary measure using the CSP method.

To improve the classification accuracy and to overcome the lack of training data, some
researchers proposed regularized CSP (RCSP) methods. The regularization is applied in one of two
ways: either to the objective function or to the covariance matrices [24]. The covariance matrices
are regularized by adding information from other subjects or sessions besides the target ones.
Lu et al. [25] proposed RCSP with generic learning (RCSPGL), which uses data from a population of
subjects to increase the training dataset size. Lu et al. [26] used aggregation to choose the
shrinkage parameters of RCSPGL. Shin et al. [27] used the RCSP method to classify human moods.
Yuksel and Olmez [28] emphasized the channels that are close to the imagery area using a spatially
regularized CSP method.

Li et al. [29] improved the RCSP method by using a statistical dependency method instead of
variances to extract features. Lotte and Guan [24] proposed four different feature extraction
algorithms, one of which is RCSP with selected subjects (SSRCSP). SSRCSP uses data from selected
subjects or sessions rather than from the whole population. Lotte and Guan [30] regularized the
objective function using a Laplacian penalty that produces a smooth spatial filter. Park et al.
[31] used the same method as FBCSP but with RCSP instead of CSP. Li and Wang [32] introduced two
smoothing algorithms, the first using a Gaussian prior and the other a ridge penalty function;
both are applied to the objective function of the RCSP method.

Cheng et al. [33] developed subject-to-subject transfer learning with RCSP by finding the weights
of the most similar subject to calculate the spatial filter. Alhakeem and Ali [34] combined RCSP
and ICA to improve the accuracy of session-to-session transfer learning of MI data. Xu et al. [35]
used cosine similarities between the source and the target subjects and proposed an iterative CSP
method to find the most suitable subject for computing the spatial filter. Kang et al. [36]
developed two methods for computing the composite CSP (CCSP) to increase classifier accuracy.
Bamdadian et al. [37] proposed an adaptive session-to-session extreme learning machine to improve
the classification accuracy. Cho et al. [38] studied the effect of background noise and found that
removing it during operation reduces the training time needed.

Some researchers proposed CSP-based algorithms for multiclass classification, unlike the
previously mentioned ones, which are binary only. Wu et al. [39] used multiple binary CSPs with a
one-versus-rest scheme to extend binary CSP to the multiclass case. Grosse-Wentrup and Buss [40]
used joint approximate diagonalization (JAD) with CSP to obtain multiclass CSP. Sun [41] proposed a
multiclass CSP that divides the frequency range into sub-ranges and then finds the spatial filters
of multiple binary problems for each sub-range. In single-subject brain computer interface (BCI)
applications such as wheelchair control, different recording sessions can be used as a regularizing
dataset to increase the number of training trials instead of relying on one day's recordings. The
proposed method uses four tuning parameters that represent the ratios of the covariance matrices
used in extracting the spatial filters.


2. RESEARCH METHOD AND DATASET 
2.1. Four-parameter regularized common spatial pattern 

There are many types of CSP methods, starting with the traditional one [2], which depends only on
the training set of the target, with no information from other users or from earlier training
states. Two methods named CCSP were proposed in [36]; they use two different ways to calculate the
weights that reflect the importance of each covariance matrix. Other methods regularize the
covariance matrices, each in its own way. RCSPGL [25] uses a population of training data from other
users together with two shrinkage parameters that set the ratios of the covariance matrices. The
SSRCSP method proposed in [24] selects the best-fitting subjects for training, even when a large
group of subjects or enough training data is available, by discarding the subjects that give
useless information and keeping the useful ones.

In the proposed method, the covariance matrices are first determined as in the traditional CSP
method. The normalized covariance matrix is computed from the data matrix $E_{ij}$, which holds the
data of the different channels over time, for each class $i$ and trial $j$:

$$C_{ij} = \frac{E_{ij}E_{ij}^{T}}{\operatorname{trace}(E_{ij}E_{ij}^{T})} \quad (1)$$

For $N$ training trials, the average covariance matrix of class $i$ is:

$$\overline{CN}_{i} = \frac{1}{N}\sum_{j=1}^{N} C_{ij} \quad (2)$$

where $N$ is the number of training trials, $i$ is 1 or 2 depending on the chosen class,
$j = 1, \dots, N$ indexes the trials, and trace is the sum of the diagonal elements of a square
matrix. The regularized covariance matrices are:

$$\widehat{CN}_{i} = a_1\,\overline{CN}_{i(\text{target})} + a_2\,\overline{CN}_{i(\text{training})} \quad (3)$$

$$\widetilde{CN}_{i} = a_3\,\widehat{CN}_{i} + a_4 I \quad (4)$$

where $a_1$, $a_2$, $a_3$, and $a_4$ are the tuning parameters and $I$ is the identity matrix. All
of the parameters belong to the interval [0, 1], with the condition that they cannot all be zero at
the same time. This condition prevents the matrices from vanishing through multiplication by a
zero parameter.
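
The two regularization phases in (1)-(4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the (channels × samples) array layout, and the random placeholder trials are assumptions made here.

```python
import numpy as np

def normalized_cov(trial):
    """Eq. (1): normalized covariance of one trial E (channels x samples)."""
    c = trial @ trial.T
    return c / np.trace(c)

def average_cov(trials):
    """Eq. (2): average of the normalized covariances over N trials."""
    return np.mean([normalized_cov(e) for e in trials], axis=0)

def regularized_cov(target_trials, generic_trials, a1, a2, a3, a4):
    """Eqs. (3)-(4) for one class; the function name is hypothetical."""
    # Reject settings that would make a whole phase vanish.
    if (a1 == 0 and a2 == 0) or (a3 == 0 and a4 == 0):
        raise ValueError("both parameters of one phase are zero")
    phase1 = a1 * average_cov(target_trials) + a2 * average_cov(generic_trials)
    return a3 * phase1 + a4 * np.eye(phase1.shape[0])

# Placeholder data: two channels (e.g. C3 and C4), 100 samples per trial.
rng = np.random.default_rng(0)
target = [rng.standard_normal((2, 100)) for _ in range(4)]
generic = [rng.standard_normal((2, 100)) for _ in range(6)]
cov = regularized_cov(target, generic, a1=0.5, a2=0.5, a3=0.8, a4=0.2)
print(cov.shape)  # (2, 2)
```

With a1=1, a2=0, a3=1, and a4=0 the function returns the plain average covariance of the target trials, which is the input used by conventional CSP.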

The first phase has two parameters, $a_1$ and $a_2$, as shown in (3); they define whether the
covariance matrix leans toward the training covariance matrix or toward the target one. These
parameters are not complementary and are independent of each other, so shrinking one matrix does
not enlarge the other or reduce its effect on the classification accuracy. The second phase, in
(4), also has two parameters; they give two matrices, one a ratio of the identity matrix and the
other a ratio of the matrix resulting from the first phase. As before, these parameters are not
complements of each other, and they do not depend on the first two parameters. The result is four
matrices that carry portions of the target user's data, the other users' data, and the identity
matrix, which gives better extraction of the different features to be classified later. To prevent
the matrices from vanishing, the two parameters of the same phase must not both be zero. After
calculating the covariance matrices, the composite covariance matrix for the two classes is defined
as (5):

$$\mathrm{sum} = \sum_{i=1}^{2} \widetilde{CN}_{i} \quad (5)$$

Then $\mathrm{sum}$ is factorized as (6):

$$\mathrm{sum} = U \Lambda U^{T} \quad (6)$$

The whitening transformation is determined as (7):

$$P = \Lambda^{-1/2} U^{T} \quad (7)$$

The whitened $\mathrm{sum}_{i}$ is as (8):

$$\mathrm{sum}_{i} = P\,\widetilde{CN}_{i}\,P^{T} \quad (8)$$

The factorization of $\mathrm{sum}_{i}$ is as (9):

$$\mathrm{sum}_{i} = B \Lambda_{i} B^{T} \quad (9)$$

The full projection matrix is as (10):

$$W = B^{T} P \quad (10)$$

The first and last $Q$ columns are selected to produce the projection matrix $W$, and the projected
signal is (11):

$$f = W E \quad (11)$$

The features are (12):

$$\hat{f} = \log\left(\frac{\operatorname{var}(f)}{\sum \operatorname{var}(f)}\right) \quad (12)$$
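
The chain from (5) to (12) can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and the random placeholder data are assumptions, the spatial filters are taken to be the rows of W (the text speaks of columns), and `np.linalg.eigh` is used for both factorizations, which returns eigenvalues in ascending order.

```python
import numpy as np

def csp_projection(cov1, cov2, q=1):
    """Projection matrix from two (regularized) class covariance matrices."""
    total = cov1 + cov2                       # Eq. (5)
    lam, u = np.linalg.eigh(total)            # Eq. (6): total = U diag(lam) U^T
    p = np.diag(lam ** -0.5) @ u.T            # Eq. (7): whitening transform
    s1 = p @ cov1 @ p.T                       # Eq. (8) for class 1
    _, b = np.linalg.eigh(s1)                 # Eq. (9): s1 = B diag(lam1) B^T
    w = b.T @ p                               # Eq. (10)
    # Assumption: filters are the rows of W; keep the first and last q of them.
    keep = list(range(q)) + list(range(w.shape[0] - q, w.shape[0]))
    return w[keep]

def log_variance_features(w, trial):
    """Eqs. (11)-(12): project one trial and take normalized log-variances."""
    f = w @ trial
    var = np.var(f, axis=1)
    return np.log(var / var.sum())

# Placeholder covariances and trial for two channels.
rng = np.random.default_rng(0)
x1 = rng.standard_normal((2, 500)) * np.array([[2.0], [0.5]])
x2 = rng.standard_normal((2, 500)) * np.array([[0.5], [2.0]])
cov1 = x1 @ x1.T / np.trace(x1 @ x1.T)
cov2 = x2 @ x2.T / np.trace(x2 @ x2.T)
w = csp_projection(cov1, cov2, q=1)
features = log_variance_features(w, x1)
print(features.shape)  # (2,)
```

With two channels and q=1 both filters are kept, so each trial yields a two-element feature vector.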


2.2. Support vector machine classifier [42] 

SVM is one of the most popular classifiers. It finds a separating hyperplane between the members
of two classes; Figure 1 shows the details of the SVM classifier. In linearly separable cases, SVM
maximizes the distance between the margins, subject to the constraint in (13):

$$y_i (x_i \cdot w + b) - 1 \ge 0, \quad \text{for all } i \quad (13)$$

where $x_i$ are the feature vectors, $y_i$ are their class labels, and $w$ and $b$ are the weight
vector and bias of the hyperplane. A linear-kernel SVM classifier is used in this work; this kernel
is suitable for the number of features and the number of training trials used.
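
The constraint in (13) can be checked directly for a candidate hyperplane. The sketch below is illustrative only: the toy feature vectors, the weight vector `w`, and the bias `b` are made-up values, not results from this work, and no SVM training is performed.

```python
import math

def satisfies_margin(samples, labels, w, b):
    """True when y_i * (x_i . w + b) - 1 >= 0 holds for every sample (Eq. 13)."""
    return all(
        y * (sum(xk * wk for xk, wk in zip(x, w)) + b) - 1 >= 0
        for x, y in zip(samples, labels)
    )

def margin_width(w):
    """Distance between the two margins, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))

# Toy two-feature samples with labels +1 / -1 (hypothetical values).
samples = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]

print(satisfies_margin(samples, labels, w=(0.5, 0.0), b=0.0))   # True
print(satisfies_margin(samples, labels, w=(0.25, 0.0), b=0.0))  # False
print(margin_width((0.5, 0.0)))                                 # 4.0
```

A smaller norm of `w` widens the margin, but only as long as every sample still satisfies the constraint; the second call fails because the closest samples fall inside the margin.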
Figure 1. SVM classifier (separable case)


2.3. Datasets 
2.3.1. Dataset_I 

This dataset is provided by Dr. Cichocki's Lab [43]; it contains multi-user MI recordings,
including multiple sessions for one user. It was recorded using two devices, Neuroscan and g.tec,
with sampling frequencies of 256 and 250 Hz, respectively. The data were recorded using different
numbers of channels: five (C3, C4, Cp3, Cp4, and Cz), six (C3, Cp3, Cz, Cpz, C4, and Fp4), and
fourteen (C1, C2, C3, C4, C5, C6, Cz, Cpz, Cp1, Cp2, Cp3, Fp1, Fp2, and Fp3). Three classes (left
hand, right hand, and both feet) were recorded for eight different healthy subjects, labeled in the
dataset as (a, b, c, d, e, and h). In this work, only two classes are used, namely imagined
movement of the left and right hands; when the data were recorded for three classes, only the
left-hand and right-hand classes are used. Our method focuses only on the signals from the two
channels C3 and C4, which are close to the motor cortex area of the brain; the signals from the
other channels are ignored.


2.3.2. Dataset_II 

This dataset was recorded by one of the authors with three healthy volunteers, each labeled by
the first letter of his or her name (A, H, and Z). None of them has a mental disease, all have
healthy or corrected eyesight, and all recorded these signals for the first time. The same
recording protocol as in [43] is used. Two classes are recorded: left hand and right hand. The
experiment took place in an ordinary room environment using an OpenBCI recording device with 16
channels (C3, C4, Cz, O1, O2, Oz, P3, P4, Pz, T7, T8, F3, F4, Fp1, Fp2, and Fpz) and a sampling
frequency of 125 Hz. Multiple sessions were recorded by each user; only the C3 and C4 channels are
used in the experiments, and all other channels are dropped.
2.4. Evaluation method

Accuracy is the most suitable measure for evaluating performance on balanced datasets. A balanced
dataset is one in which every class has the same number of samples. The standard deviation is used
to measure the spread of the results around their average:

$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \quad (14)$$

$$\text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^{2}}{N-1}} \quad (15)$$

where $N$ is the number of samples, $x_i$ is the tested sample, and $\bar{x}$ is the average of the
samples.
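
Both measures are straightforward to compute. The sketch below follows (14) and (15) as written, with accuracy scaled to a percentage; the function names and the example labels are hypothetical.

```python
import math

def accuracy(true_labels, predicted_labels):
    """Eq. (14), expressed as a percentage."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

def standard_deviation(values):
    """Eq. (15): deviation from the average with an N - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Hypothetical labels for eight test trials (1 = left hand, 2 = right hand).
truth = [1, 2, 1, 2, 1, 2, 1, 2]
predicted = [1, 2, 1, 1, 1, 2, 2, 2]
print(accuracy(truth, predicted))              # 75.0
print(standard_deviation([70.0, 80.0, 90.0]))  # 10.0
```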


3. RESULTS AND DISCUSSION 

The four-parameter regularized CSP (FPRCSP) method is used to extract the features from both
dataset I and dataset II, described in subsection 2.3. This work studies the effect of adding newly
recorded sessions, and of their number, on the classification accuracy. A linear-kernel SVM is used
as the classifier; because the feature vector is small, a more complex SVM kernel is not needed.
The accuracy is averaged over ten runs to overcome any randomness in the initial values, and the
standard deviation of the runs is also reported. The dataset is split into a training set and a
testing set that do not overlap. As a preprocessing step, the signals are filtered using a
band-pass filter covering the alpha and beta frequency range of brain activity, which is 7-30 Hz.
All the tuning parameters used in FPRCSP ($a_1$, $a_2$, $a_3$, and $a_4$) are generated using
simple loops over the interval [0, 1] with a step of 0.1.
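
The loop that generates the tuning parameters could look like the sketch below. It enumerates every combination on the 0.1 grid and skips those in which both parameters of a phase are zero; how the best combination is then chosen is not specified in the text, so that step is left out. The function name is hypothetical.

```python
from itertools import product

def parameter_grid(steps=10):
    """Yield (a1, a2, a3, a4) on a grid over [0, 1] with spacing 1 / steps."""
    values = [k / steps for k in range(steps + 1)]
    for a1, a2, a3, a4 in product(values, repeat=4):
        # Skip settings where a whole phase would vanish.
        if (a1 == 0 and a2 == 0) or (a3 == 0 and a4 == 0):
            continue
        yield a1, a2, a3, a4

grid = list(parameter_grid())
print(len(grid))  # 14400
```

Each phase has 11 × 11 - 1 = 120 admissible pairs, so the grid holds 120 × 120 = 14400 candidate settings.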

To show how the training-set size and the addition of new sessions affect the accuracy, subject c
of dataset I is used in this experiment. Seven sessions, each recorded on a different day, are used
to show the effect of adding new sessions to the training set. The sessions are divided as follows:
the newest session is always the target and the older ones are the generic data (e.g., session
c1-4 means that the first three sessions (1, 2, and 3) are the generic training data and the fourth
one (session 4) is the target session). Figure 2 shows that when the sessions are few (two to four
sessions), the number of trials does not improve the classification accuracy unless it is
relatively high (40 to 50 trials). This suggests that the subject was not yet trained well enough
to generate a signal strong enough to be classified with a low number of trials. After the fifth
day of recording, the accuracy starts to improve and exceeds 80% regardless of the number of trials
used. What is the effect on the training-set size? The training-set size equals the number of
trials (N) × the number of sessions (i.e., five sessions of N=50 trials each give a training-set
size of 50×5=250 trials). As the data grow, the old data become less useful and can be discarded
to keep the training set up to date and as small as possible.
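
The session bookkeeping described above can be written down in a few lines. This is only a sketch: the helper names are hypothetical, and the cap on the number of retained sessions (`max_sessions`) is an assumed parameter, since the text does not say how many old sessions should be kept.

```python
def split_sessions(session_ids):
    """Newest session is the target; all earlier sessions are generic data."""
    *generic, target = session_ids
    return generic, target

def training_set_size(trials_per_session, n_sessions):
    """Number of trials (N) times the number of sessions."""
    return trials_per_session * n_sessions

def keep_recent(session_ids, max_sessions):
    """Discard the oldest sessions; max_sessions is a hypothetical positive cap."""
    return session_ids[-max_sessions:]

generic, target = split_sessions([1, 2, 3, 4])   # the "c1-4" case
print(generic, target)                           # [1, 2, 3] 4
print(training_set_size(50, 5))                  # 250
print(keep_recent([1, 2, 3, 4, 5, 6, 7], 5))     # [3, 4, 5, 6, 7]
```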

To assess the performance of the proposed method, the results of FPRCSP are compared with seven
other methods (CSP, CCSP1, CCSP2, RCSPDL, RCSPGL, SSRCSP, and ICA_RCSP). The results of this
comparison, using different numbers of trials, are shown in Tables 1-3. Some results appear as NaN,
which means that a division by zero occurred. FPRCSP does not suffer from the division-by-zero
problem because no parameter in the denominator can be zero. From Tables 1 and 2 it is clear that
the FPRCSP method generally gives better results once the data of four or more days are added, for
both numbers of trials per session shown in the tables (12 and 20 trials); in addition, its average
classification accuracy is the best among the compared methods.

In Table 3 the number of trials is 50 from each session. Because the data become sufficient to
extract good features from two sessions onward, the accuracy increases from about 80% to 92%. Fifty
trials per session is not so many that the user becomes tired, which would reduce the power of the
recorded signals, and it does not make the dataset so large that it becomes inseparable using the
SVM classifier. If all seven sessions are used, the overall number of trials is only 50×7=350,
giving 92% classification accuracy.

The results of subject c of dataset I and subject Z of dataset II are shown in Figure 3; the data
were recorded during five different sessions for only two classes. The comparison among the
different CSP methods uses N=40 training trials from each session. The FPRCSP method still has the
best result among all the compared methods. Table 4 shows the results of subject A of dataset II,
where the comparison is done for N=20 trials per session. Both subject A and subject Z recorded the
signals for the first time and have no experience of motor imagery. Figure 4 shows the result of
testing the FPRCSP method with subject-to-subject transfer learning using N=50 trials per subject;
each subject in turn is the target, and the others form the rest of the population.
Figure 2. A comparison according to the classification accuracy of the FPRCSP method using
different training dataset sizes of subject_c/dataset_I


Table 1. A comparison according to classification accuracy (%) of subject_c/dataset_I when using
12 trials from each session

Method              CSP       CCSP1     CCSP2     RCSPDL    RCSPGL    SSRCSP    ICA_RCSP  FPRCSP
Sessions c1_2       54.347    71.014    71.014    53.623    71.014    71.014    67.173    61.594
Sessions c1_3       46.296    NaN       NaN       65.740    NaN       NaN       55.555    63.888
Sessions c1_4       20.370    50        50.925    50        50.925    NaN       62.129    57.407
Sessions c1_5       58.333    75.694    76.388    67.361    81.944    75.694    82.152    84.722
Sessions c1_6       85.227    NaN       NaN       88.636    NaN       NaN       83.181    89.772
Sessions c1_7       88.888    91.666    62.037    63.888    91.666    NaN       78.703    86.111
Average accuracy    58.910    72.093    65.091    64.875    73.8878   73.354    71.482    73.916
Standard deviation  23.298    14.877    9.6528    12.3853   15.136    2.3399    10.5085   13.177


Table 2. A comparison according to classification accuracy (%) of subject_c/dataset_I when using
20 trials from each session

Method              CSP       CCSP1     CCSP2     RCSPDL    RCSPGL    SSRCSP    ICA_RCSP  FPRCSP
Sessions c1_2       51.538    52.307    52.307    53.076    52.307    52.307    65.384    60.7692
Sessions c1_3       42        NaN       NaN       44        NaN       NaN       50        67
Sessions c1_4       50        50        50        50        50        NaN       57        73
Sessions c1_5       31.617    74.264    77.205    69.852    77.205    77.205    74.558    83.8235
Sessions c1_6       87.50     NaN       NaN       90        NaN       NaN       76.25     88.75
Sessions c1_7       82        83        63        83        92        NaN       85.10     88
Average accuracy    57.4426   64.893    60.628    64.988    67.878    64.756    68.048    76.8904
Standard deviation  20.4212   14.105    10.7544   17.2274   17.5423   12.449    11.9395   10.6847


Table 3. A comparison according to classification accuracy (%) of subject_c/dataset_I when using
50 trials from each session

Method              CSP       CCSP1     CCSP2     RCSPDL    RCSPGL    SSRCSP    ICA_RCSP  FPRCSP
Sessions c1_2       67        71        77        79        77        77        54.30     68
Sessions c1_3       48.571    55.714    55.714    54.285    55.714    58.571    17.428    80
Sessions c1_4       81.428    82.857    82.857    82.857    82.857    82.857    40        82.857
Sessions c1_5       83.962    92.452    92.452    80.188    92.452    92.452    49.528    91.509
Sessions c1_6       86        90        90        90        90        NaN       62        82
Sessions c1_7       91.428    95.714    94.285    95.714    95.714    94.285    49.428    92.857
Average accuracy    76.3984   82.2897   82.0516   80.3409   82.2897   81.033    45.447    82.870
Standard deviation  14.5102   13.4009   13.1710   13.0170   13.4009   12.8853   14.139    8.21068


Session to session transfer learning using regularized four parameters common ... (Zaineb M. Alhakeem) 




[Figure 3: bar chart of accuracy for the CSP, CCSP1, CCSP2, RCSPDL, RCSPGL, SSRCSP, and FPRCSP
methods, with one series per dataset]

Figure 3. A comparison according to classification accuracy of subject_c/dataset_I and
subject_z/dataset_II when using 40 trials from each session


Table 4. A comparison among different CSP methods using subject_A/dataset_II with N=20
(accuracy in %)

Method              CSP       CCSP1     CCSP2     RCSPDL    RCSPGL    SSRCSP    FPRCSP
Sessions A1_2       43.333    50        50        53.333    53.333    50        56.667
Sessions A1_3       46.666    50        50        50        50        50        50
Accuracy average    45        50        50        51.6665   51.6665   50        53.335
Standard deviation  1.666     0         0         1.6665    1.6665    0         3.335
Figure 4. A comparison among different CSP methods using the data of 6 subjects from dataset_I and
3 subjects from dataset_II with N=50 according to accuracy


4. CONCLUSION 

Transfer learning, such as session-to-session transfer, is very important in single-user BCI
applications when there is not enough data to train the model. Applications such as wheelchair
control are used by only one person, so there is no need to train the model using other users'
data. The FPRCSP method is based on regularizing the subject's data across different sessions,
because the mood and attention of the subject differ from time to time and from day to day. To
achieve good classification accuracy for electroencephalography (EEG) signals over multiple
sessions, for a single user or for multiple users in different sessions, tuning parameters are used
to take ratios of different covariance matrices; four of them are used in this work. FPRCSP is
applied to both subject-to-subject and session-to-session transfer learning.

Although the results of session-to-session transfer learning are better than those of
subject-to-subject transfer learning in terms of classification accuracy, the inter-subject results
are also acceptable, as shown previously. Some methods, such as CSP, CCSP1, CCSP2, RCSPDL, RCSPGL,
SSRCSP, and ICA_RCSP, are special cases of FPRCSP, so there is no need to spend time choosing one
of them in the feature extraction phase; using FPRCSP instead will do the work in most cases.
Adding a fresh recording session increases the classification accuracy. For future work, a method
should be found to choose the tuning parameters exactly.